Table Column Heading Descriptions

Table 1 

Seq No. 

Provides the SEQ ID NO. for the listed sequences.

Seq ID

Arbitrary identification assigned to each contig or singleton of genomic sequence.
Contig designations begin with ATL8C; singleton designations begin with ATL8S.



Table 2 

Seq No. 

Provides the SEQ ID NO. for the listed sequences. 
Seq ID

Arbitrarily assigned number for each cluster of Arabidopsis thaliana (Columbia) ESTs. EST
contig designations begin with ATCEA4.

Table 3 
Seq No.

Provides the SEQ ID NO. for the listed sequences. 

Seq ID 

Arbitrarily assigned number for each ATU (Arabidopsis thaliana unigene).

Unigene location 

Indicates the genomic contig or singleton from which each ATU is identified and the location
of the ATU within that contig or singleton. In cases where the first numeral is higher than its
corresponding second numeral, the Arabidopsis thaliana protein or fragment thereof is encoded by
the complement of the sequence set forth in the sequence listing. The first numeral, separated
from the contig or singleton ID by a colon, represents the starting point of the codon for the
most N-terminal amino acid (if the first numeral is lower than the second) or the most C-terminal
amino acid (if the first numeral is higher than the second) of the protein or protein fragment
encoded by the ATU.
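The strand convention described above can be illustrated with a short Python sketch. It assumes 1-based, inclusive coordinates (the table does not state the convention) and a contig sequence already loaded as a string; the function names and the toy sequence are hypothetical and are not part of the sequence listing or of any tool used to build the table.

```python
# Sketch only: assumes 1-based, inclusive coordinates (not stated in the table).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_atu(contig_seq: str, first: int, second: int) -> tuple[str, str]:
    """Return (strand, sequence) for an ATU given as first..second.

    first <= second: the ATU is read directly from the listed strand.
    first > second: the ATU is encoded by the complement, so the span is
    reverse-complemented to read 5'->3' on the complementary strand.
    """
    lo, hi = min(first, second), max(first, second)
    if lo < 1 or hi > len(contig_seq):
        raise ValueError("coordinates fall outside the contig")
    span = contig_seq[lo - 1:hi]  # convert 1-based inclusive to a Python slice
    if first <= second:
        return "+", span
    return "-", reverse_complement(span)


contig = "ATGAAACCCGGGTTTTAG"  # toy sequence, not from the listing
print(extract_atu(contig, 1, 9))  # ('+', 'ATGAAACCC')
print(extract_atu(contig, 9, 1))  # ('-', 'GGGTTTCAT')
```

The same reading applies to the Genscan prediction column, which uses the identical coordinate convention.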

Principal evidence 

A code which identifies the most reliable ATU selection method for the particular ATU. 
The selection methods are described in detail in Example 4 and briefly summarized as follows: 

gap2: GAP2-identified ORFs

nap: NAP-predicted ORFs

GENSCAN: GenScan prediction

ESTs aligned 

Indicates the ATCEA4 EST library entry(ies) for the sequence(s) that matched the
Arabidopsis thaliana contig query.
EST pct identities

Indicates the percent identity for the EST alignment(s). 

Genscan prediction

Indicates the genomic contig or singleton from which each ATU is predicted by GenScan and
the location of the ATU within that contig or singleton. In cases where the first numeral is
higher than its corresponding second numeral, the Arabidopsis thaliana protein or fragment
thereof is encoded by the complement of the sequence set forth in the sequence listing. The first
numeral, separated from the contig or singleton ID by a colon, represents the starting point of
the codon for the most N-terminal amino acid (if the first numeral is lower than the second) or
the most C-terminal amino acid (if the first numeral is higher than the second) of the protein or
protein fragment encoded by the ATU.

Genscan weighted p score

Indicates the weighted mean GenScan probability for correctness of the prediction. 

Ncbi gids 

Refers to the National Center for Biotechnology Information (NCBI) GenBank Identifier (GI)
number of the sequence that is the best match for a given contig or singleton region from which
the associated ATU was identified using NAP.

Nap identities 

The percentage of identically matched nucleotides (or residues) along the portion of the
sequence that is aligned by the BLAST comparison used to generate the statistical scores
presented.
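As an illustration of this percent-identity definition, the following Python sketch counts identical columns in a pairwise alignment supplied as two equal-length gapped strings. It assumes gaps are written as "-" and that the denominator is the full alignment length including gap columns, which is how BLAST reports its "Identities" figure; the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumes a pairwise alignment given as two equal-length strings with '-'
    for gaps; gap columns count toward the length but never as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```

Counting gap columns in the denominator makes a gapped alignment score lower than an ungapped one with the same number of matches, which is the usual BLAST behavior.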

Nap scores 

The aat_nap score is reported by the NAP program in the AAT package. It is an
alignment score in which each match and mismatch is scored based on the BLOSUM62 scoring
matrix.
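To make the scoring idea concrete, the Python sketch below sums BLOSUM62 substitution scores over an ungapped protein alignment. It is a simplification: it assumes no gaps or frameshifts (which NAP does handle), and it embeds only an excerpt of BLOSUM62 (all 20 diagonal entries plus a few conservative substitutions), so a residue pair outside the excerpt raises an error rather than being guessed. The table and function names are hypothetical.

```python
# Excerpt of BLOSUM62 only: all identities plus a few conservative substitutions.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("R", "R"): 5, ("N", "N"): 6, ("D", "D"): 6, ("C", "C"): 9,
    ("Q", "Q"): 5, ("E", "E"): 5, ("G", "G"): 6, ("H", "H"): 8, ("I", "I"): 4,
    ("L", "L"): 4, ("K", "K"): 5, ("M", "M"): 5, ("F", "F"): 6, ("P", "P"): 7,
    ("S", "S"): 4, ("T", "T"): 5, ("W", "W"): 11, ("Y", "Y"): 7, ("V", "V"): 4,
    ("I", "V"): 3, ("I", "L"): 2, ("L", "M"): 2, ("K", "R"): 2, ("D", "E"): 2,
    ("E", "Q"): 2, ("F", "Y"): 3, ("S", "T"): 1, ("L", "V"): 1,
}


def ungapped_score(protein_a: str, protein_b: str) -> int:
    """Sum BLOSUM62 scores column by column for two equal-length, ungapped sequences."""
    if len(protein_a) != len(protein_b):
        raise ValueError("sequences must have equal length")
    total = 0
    for x, y in zip(protein_a.upper(), protein_b.upper()):
        # BLOSUM62 is symmetric, so look the pair up in either order.
        key = (x, y) if (x, y) in BLOSUM62_EXCERPT else (y, x)
        if key not in BLOSUM62_EXCERPT:
            raise KeyError(f"pair {x}/{y} is not in this BLOSUM62 excerpt")
        total += BLOSUM62_EXCERPT[key]
    return total


print(ungapped_score("ILK", "VLR"))  # 9  (I/V 3 + L/L 4 + K/R 2)
```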

Blastx pscores 

A score generated by BLASTX sequence comparison of the designated clone with the
designated GenBank sequence.

Genbank description 

A description of the database entry referenced in the "NAP hit" column. 
We claim:

1. A substantially purified nucleic acid molecule comprising an Arabidopsis thaliana EST
selected from the group consisting of SEQ ID NO: 81307 through SEQ ID NO: 138060 and
complements thereof.

2. A substantially purified nucleic acid molecule of the Arabidopsis thaliana genome
having a nucleic acid sequence selected from the group consisting of SEQ ID NO: 138061
through SEQ ID NO: 195836 and complements thereof.

3. The substantially purified nucleic acid molecule according to claim 2, wherein said group
consists of SEQ ID NO: 138061 through SEQ ID NO: 162749 and complements thereof.

4. The substantially purified nucleic acid molecule according to claim 2, wherein said group
consists of SEQ ID NO: 162750 through SEQ ID NO: 174652 and complements thereof.

5. The substantially purified nucleic acid molecule according to claim 2, wherein said group
consists of SEQ ID NO: 174653 through SEQ ID NO: 195836 and complements thereof.

6. The substantially purified nucleic acid molecule according to claim 2, wherein said
nucleic acid molecule further comprises nucleic acid sequences comprising one or more of a
promoter region, regulatory region or intron region or parts of said regions.

7. A substantially purified first nucleic acid molecule which is complementary to a second
nucleic acid molecule of the Arabidopsis thaliana genome having a nucleic acid sequence
selected from the group consisting of SEQ ID NO: 138061 through SEQ ID NO: 195836 and
complements thereof, wherein said first nucleic acid molecule and said second nucleic acid
molecule hybridize to one another with sufficient stability to remain annealed to one another
under at least low stringency conditions of washing with a salt solution having a concentration of
about 2.0 X sodium chloride/sodium citrate (SSC) at 50°C. 

8. The substantially purified first nucleic acid molecule according to claim 7, wherein said
stringency conditions are at least 0.2 X SSC at 50°C.

9. The substantially purified first nucleic acid molecule according to claim 7, wherein said
second nucleic acid molecule of the Arabidopsis thaliana genome has a sequence selected from the
group consisting of SEQ ID NO: 138061 through SEQ ID NO: 162749 and complements
thereof.

10. The substantially purified first nucleic acid molecule according to claim 7, wherein said
second nucleic acid molecule of the Arabidopsis thaliana genome has a sequence selected from
the group consisting of SEQ ID NO: 162750 through SEQ ID NO: 174652 and complements
thereof.

11. The substantially purified first nucleic acid molecule according to claim 7, wherein said
second nucleic acid molecule of the Arabidopsis thaliana genome has a sequence selected from
the group consisting of SEQ ID NO: 174653 through SEQ ID NO: 195836 and complements
thereof.

12. A substantially purified first nucleic acid molecule which is homologous to a second
nucleic acid molecule having a nucleic acid sequence selected from the group consisting of SEQ
ID NO: 138061 through SEQ ID NO: 195836 and complements thereof, wherein at least 90% of
the nucleic acid sequence of said substantially purified first nucleic acid molecule is identical to
said second nucleic acid molecule.
13. The substantially purified first nucleic acid molecule according to claim 12, wherein said
first nucleic acid sequence is 100% identical to a nucleic acid sequence of a non-Arabidopsis 
thaliana homologue. 

14. The substantially purified first nucleic acid molecule according to claim 12, wherein at
least 98% of the sequence of said substantially purified first nucleic acid molecule is identical
to said second nucleic acid molecule.

15. The substantially purified first nucleic acid molecule according to claim 12, wherein said
second nucleic acid molecule has a sequence selected from the group consisting of SEQ ID NO:
138061 through SEQ ID NO: 162749 and complements thereof.

16. The substantially purified first nucleic acid molecule according to claim 12, wherein said
second nucleic acid molecule has a sequence selected from the group consisting of SEQ ID NO:
162750 through SEQ ID NO: 174652 and complements thereof.

17. The substantially purified first nucleic acid molecule according to claim 12, wherein said
second nucleic acid molecule has a sequence selected from the group consisting of SEQ ID NO:
174653 through SEQ ID NO: 195836 and complements thereof.

18. A substantially purified polypeptide encoded by a nucleic acid sequence selected from 
the group consisting of SEQ ID NO: 138061 through SEQ ID NO: 195836. 

19. A transformed cell, organism, or plant having an exogenous nucleic acid molecule
which comprises:

(a) a promoter region which functions in said cell to cause the production of an mRNA
molecule; which is linked to
(b) a structural nucleic acid molecule which is homologous or complementary to a
nucleic acid molecule according to claim 2, which is linked to

(c) a 3' non-translated sequence that functions in said cell to cause termination of
transcription and addition of polyadenylated ribonucleotides to a 3' end of said
mRNA molecule.

20. A transformed cell or organism according to claim 19 which is selected from the group 
consisting of a plant cell, plant, mammalian cell, mammal, fish cell, fish, bird cell, bird, bacterial 
cell and fungal cell and wherein said mRNA encodes a protein in said cell. 

21. A transformed cell or organism according to claim 19, wherein said structural nucleic
acid molecule is a transcribed nucleic acid molecule with a transcribed strand and a
non-transcribed strand and the transcribed strand specifically hybridizes to an mRNA molecule.

22. Computer readable medium having recorded thereon at least 1000 of the nucleotide
sequences depicted in SEQ ID NO: 138061 through SEQ ID NO: 195836 or complements
thereof.

23. Computer readable medium according to claim 22 having recorded thereon at least
10,000 of said nucleotide sequences.

24. A transformed cell or organism having an exogenous nucleic acid molecule which 
comprises: 

(a) a promoter region which functions in said cell to cause the production of an mRNA
molecule, wherein said promoter nucleic acid molecule is selected from the group
consisting of a promoter located within SEQ ID NO: 1 through SEQ ID NO: 81306
or complements thereof upstream of a gene having a nucleic acid sequence selected
from the group consisting of SEQ ID NO: 138061 through SEQ ID NO: 195836;
which is linked to

(b) a structural nucleic acid molecule encoding a protein or peptide; which is linked to 

(c) a 3' non-translated nucleic acid sequence that functions in said cell to cause
termination of transcription and addition of polyadenylated ribonucleotides to a 3'
end of said mRNA molecule.

25. A transformed cell or organism according to claim 24 which is selected from the group
consisting of a plant cell, plant, mammalian cell, mammal, fish cell, fish, bird cell, bird, bacterial
cell and fungal cell and wherein said mRNA encodes a protein in said cell.

26. A transformed cell or organism having an exogenous nucleic acid molecule which
comprises a structural nucleic acid sequence which expresses an mRNA which is complementary
to and hybridizes to at least part of a nucleic acid molecule having a sequence selected from the
group consisting of SEQ ID NO: 138061 through SEQ ID NO: 195836 and homologs thereof.

27. A substantially purified oligonucleotide nucleic acid molecule comprising between about 
15 and about 100 nucleotides homologous or complementary to a nucleotide sequence within 
any of SEQ ID NO: 138061 through SEQ ID NO: 195836. 

28. An oligonucleotide nucleic acid molecule according to claim 27 comprising in the range of
18 to 50 bases, wherein from 18 to 25 of said bases are identical or complementary to an 18-25
bp segment of sequences from a fragment of SEQ ID NO: 138061 through SEQ ID NO: 195836.

29. An oligonucleotide nucleic acid molecule according to claim 28 wherein said 18 to 25 of
said bases are identical or complementary to an 18-25 bp segment of sequences from a fragment
of SEQ ID NO: 138061 through SEQ ID NO: 162749.
30. An oligonucleotide nucleic acid molecule according to claim 28 wherein said 18 to 25 of
said bases are identical or complementary to an 18-25 bp segment of sequences from a fragment
of SEQ ID NO: 162750 through SEQ ID NO: 174652.

31. An oligonucleotide nucleic acid molecule according to claim 28 wherein said 18 to 25 of
said bases are identical or complementary to an 18-25 bp segment of sequences from a fragment
of SEQ ID NO: 174653 through SEQ ID NO: 195836.

32. A collection of at least 1000 non-identical oligonucleotides according to claim 27.

33. A collection of at least 2000 non-identical oligonucleotides according to claim 27.

34. A collection of at least 5000 non-identical oligonucleotides according to claim 27. 

35. A collection of at least 10,000 non-identical oligonucleotides according to claim 27.

36. A collection of at least 15,000 non-identical oligonucleotides according to claim 27.

37. A collection of at least 20,000 non-identical oligonucleotides according to claim 27.

38. A collection according to claim 32 wherein said oligonucleotides are situated in an array 
on a substrate. 

39. A primer pair for amplification of a nucleic acid molecule of SEQ ID NO: 138061
through SEQ ID NO: 195836 comprising oligonucleotides according to claim 27.
40. A collection of purified nucleic acid molecules generated from a DNA template of the
Arabidopsis thaliana genome using a collection of primer pairs according to claim 39, wherein 
said collection of purified nucleic acid molecules comprises at least 2 non-identical purified 
nucleic acid molecules. 

41. A collection according to claim 40 wherein said purified nucleic acid molecules are 
generated by polymerase chain reaction. 

42. A collection according to claim 41 comprising at least about 1000 non-identical nucleic
acid molecules.

43. A collection according to claim 41 comprising at least about 2000 non-identical nucleic
acid molecules.

44. A collection according to claim 41 comprising at least about 5000 non-identical nucleic
acid molecules.

45. A collection according to claim 41 comprising at least about 15,000 non-identical nucleic
acid molecules.

44. A collection according to claim 41 comprising at least about 20,000 non-identical nucleic
acid molecules.

45. A collection according to claim 41 comprising at least about 30,000 non-identical nucleic
acid molecules.
46. A collection according to claim 41 wherein said purified nucleic acid molecules are
situated in an array on a substrate. 



47. A collection of at least 3000 non-identical purified nucleic acid molecules having nucleic
acid sequences selected from the group consisting of

(a) SEQ ID NO: 138061 through SEQ ID NO: 195836; 

(b) sequences which are complementary to the nucleic acid sequences of
group (a), wherein said purified nucleic acid molecules hybridize to
nucleic acid molecules of the Arabidopsis thaliana genome having a
sequence of a complement of group (a) with sufficient stability to remain
annealed to one another under at least low stringency conditions of
washing with a salt solution having a concentration of about 0.2 X sodium
chloride/sodium citrate (SSC) at 22°C; and

(c) sequences which are homologous to the nucleic acid sequences of group
(a), wherein at least 90% of said sequences are identical to the homologous
sequence of group (a).

48. A collection according to claim 47 wherein said nucleic acid molecules are located in one
or more arrays on a substrate.

49. A collection according to claim 47 comprising at least about 10,000 non-identical nucleic
acid molecules.

50. A collection according to claim 47 comprising at least about 15,000 non-identical nucleic
acid molecules.
51. A collection according to claim 47 comprising at least about 20,000 non-identical nucleic
acid molecules. 

52. A collection according to claim 47 comprising at least about 30,000 non-identical nucleic 
acid molecules. 



53. A method for determining gene expression, comprising:

(a) collecting mRNA from tissue of an organism; 

(b) using said mRNA as a template for producing a quantity of a labeled
nucleic acid molecule;

(c) contacting said labeled nucleic acid molecule with a collection of purified 
nucleic acid molecules according to claim 47. 

54. A method according to claim 53, wherein said purified nucleic acid molecules are
capable of determining gene expression of at least 5000 Arabidopsis thaliana genes and said
purified nucleic acid molecules are deposited in an array on a substrate.



55. A method according to claim 53, wherein said purified nucleic acid molecules are
capable of determining gene expression of at least 10,000 Arabidopsis thaliana genes and
said purified nucleic acid molecules are deposited in an array on a substrate.



56. A method according to claim 53, wherein said purified nucleic acid molecules are
capable of determining gene expression of at least 15,000 Arabidopsis thaliana genes and
said purified nucleic acid molecules are deposited in an array on a substrate.

57. A method according to claim 53, wherein said purified nucleic acid molecules are
capable of determining gene expression of at least 20,000 Arabidopsis thaliana genes and
said purified nucleic acid molecules are deposited in an array on a substrate.
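Claims 28 through 31 turn on whether an oligonucleotide of 18 to 50 bases shares an 18 to 25 bp segment, in either orientation, with one of the listed sequences. The Python sketch below shows one way such a screen could be run computationally. It assumes exact matching, treats "complementary" as matching the reverse complement of the target, and uses a hypothetical function name and toy sequence; it illustrates the claim language and is not a test defined by the claims. Because any matching stretch of 18 to 25 bases contains a matching 18-base window, checking 18-mers is sufficient.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def shares_segment(oligo: str, target: str, seg_len: int = 18,
                   length_range: tuple[int, int] = (18, 50)) -> bool:
    """Hypothetical screen: is the oligo within length_range, and does some
    seg_len-mer of it occur exactly in target or in target's reverse complement?"""
    oligo, target = oligo.upper(), target.upper()
    low, high = length_range
    if not low <= len(oligo) <= high:
        return False
    strands = (target, reverse_complement(target))
    # With the default seg_len of 18, any longer matching stretch contains a matching 18-mer.
    for i in range(len(oligo) - seg_len + 1):
        window = oligo[i:i + seg_len]
        if any(window in strand for strand in strands):
            return True
    return False


target = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACG"  # toy 40-nt sequence
print(shares_segment(target[5:25], target))                       # True (same strand)
print(shares_segment(reverse_complement(target[10:30]), target))  # True (complement)
print(shares_segment("A" * 20, target))                           # False
```

A real screen over SEQ ID NO: 138061 through SEQ ID NO: 195836 would more likely use an index or an aligner than repeated substring searches, but the acceptance logic is the same.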
Abstract

The present invention relates to nucleic acid sequences from the dicotyledonous plant
Arabidopsis thaliana and, in particular, to genomic DNA sequences. The invention encompasses
nucleic acid molecules present in non-coding regions as well as nucleic acid molecules that
encode proteins and fragments of proteins. In addition, proteins and fragments of proteins so
encoded and antibodies capable of binding the proteins are encompassed by the present
invention. The invention also encompasses oligonucleotides, including primers, e.g. useful for
amplifying nucleic acid molecules, and collections of nucleic acid molecules and
oligonucleotides, e.g. in microarrays. The invention also provides constructs and transgenic cells
and organisms comprising nucleic acid molecules of the invention. The invention also relates to
methods of using the disclosed nucleic acid molecules, oligonucleotides, proteins, fragments of
proteins, and antibodies, for example, for gene identification and analysis, and preparation of
constructs and transgenic cells and organisms.
